Efficient Dynamic Dictionary Matching with DAWGs and AC-automata

Authors

  • Shunsuke Inenaga
  • Ryo Yoshinaka
  • Ayumi Shinohara
Abstract

Dictionary matching is the task of finding all occurrences of the pattern strings from a set D (called a dictionary) in a text string T. The Aho-Corasick automaton (AC-automaton) built on D is a fundamental data structure that solves the dictionary matching problem in O(d log σ) preprocessing time and O(n log σ + occ) matching time, where d is the total length of the patterns in D, n is the length of the text, σ is the alphabet size, and occ is the total number of occurrences of all the patterns in the text. Dynamic dictionary matching is a variant in which patterns may be inserted into and deleted from D dynamically; if only insertions are allowed, the problem is called semi-dynamic dictionary matching. In this paper, we propose two efficient algorithms, each of which can solve both problems with some modifications.

For a pattern of length m, our first algorithm supports insertions in O(m log σ + log d / log log d) time and pattern matching in O(n log σ + occ) time in the semi-dynamic setting. With some modifications, it also supports both insertions and deletions in O(σm + log d / log log d) time and pattern matching in O(n(log d / log log d + log σ) + occ · (log d / log log d)) time for the dynamic dictionary matching problem. This algorithm is based on the directed acyclic word graph (DAWG) of Blumer et al. (JACM 1987).

Our second algorithm, which is based on the AC-automaton, supports insertions in O(m log σ + u_f + u_o) time in the semi-dynamic setting, and both insertions and deletions in O(σm + u_f + u_o) time in the dynamic setting, where u_f and u_o denote the numbers of states whose failure function and output function, respectively, need to be updated. In both settings it performs pattern matching in O(n log σ + occ) time. This update time is optimal among AC-automaton based methods, since any algorithm that explicitly maintains the AC-automaton requires Ω(u_f + u_o) update time.
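For reference, a static AC-automaton with the goto, failure, and output functions mentioned above can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes a fixed dictionary and uses Python dictionaries for the transitions (hashing, rather than the O(log σ) branching behind the stated bounds). The class name `AhoCorasick` and its methods are hypothetical, and none of the dynamic insertion or deletion machinery is shown.

```python
from collections import deque


class AhoCorasick:
    """Hypothetical sketch of a static AC-automaton over a fixed dictionary."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[s][c]: trie child of state s on character c
        self.fail = [0]    # failure function: longest proper suffix that is also a state
        self.out = [[]]    # output function: indices of patterns recognized at state s
        self.patterns = list(patterns)
        for i, p in enumerate(self.patterns):
            self._add(p, i)
        self._build_failure()

    def _add(self, pattern, index):
        # Insert the pattern into the trie (goto function), creating states as needed.
        s = 0
        for c in pattern:
            if c not in self.goto[s]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[s][c] = len(self.goto) - 1
            s = self.goto[s][c]
        self.out[s].append(index)

    def _build_failure(self):
        # Breadth-first order guarantees that shallower states are finished first.
        queue = deque(self.goto[0].values())  # depth-1 states keep fail = root
        while queue:
            r = queue.popleft()
            for c, s in self.goto[r].items():
                queue.append(s)
                f = self.fail[r]
                while f and c not in self.goto[f]:
                    f = self.fail[f]
                self.fail[s] = self.goto[f].get(c, 0)
                # Patterns ending here plus those inherited through the failure link.
                self.out[s] = self.out[s] + self.out[self.fail[s]]

    def search(self, text):
        """Yield (start, pattern_index) for every occurrence in text (0-based)."""
        s = 0
        for j, c in enumerate(text):
            while s and c not in self.goto[s]:
                s = self.fail[s]
            s = self.goto[s].get(c, 0)
            for i in self.out[s]:
                yield j - len(self.patterns[i]) + 1, i
```

On the classic dictionary {he, she, his, hers}, `sorted(AhoCorasick(["he", "she", "his", "hers"]).search("ushers"))` returns `[(1, 1), (2, 0), (2, 3)]`: "she" starts at position 1, and "he" and "hers" both start at position 2.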
Keywords: dynamic dictionary matching, AC-automaton, DAWG
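The first algorithm builds on the DAWG of Blumer et al. (JACM 1987). As a minimal sketch, assuming Python dictionaries for the edges, the class below maintains a DAWG of a growing set of patterns with the standard online (suffix-automaton style) construction: each new pattern is fed character by character starting from the root, and a state is split ("cloned") whenever it would otherwise represent strings with different right contexts. The names `DAWG`, `insert`, and `contains_factor` are hypothetical; the sketch supports insertions only and does not reproduce the paper's update or matching procedures or their time bounds.

```python
class DAWG:
    """Hypothetical sketch: online DAWG (generalized suffix automaton) of a pattern set."""

    def __init__(self):
        # State 0 is the root (empty string); link -1 means "no suffix link".
        self.next = [{}]
        self.link = [-1]
        self.length = [0]  # length of the longest string represented by each state

    def _new_state(self, length, edges, link):
        self.next.append(dict(edges))
        self.link.append(link)
        self.length.append(length)
        return len(self.next) - 1

    def _clone(self, p, q, c):
        # Split q: the clone takes the strings of length <= length[p] + 1,
        # copying q's outgoing edges and suffix link.
        clone = self._new_state(self.length[p] + 1, self.next[q], self.link[q])
        # Redirect edges on c that pointed to q along p's suffix-link path.
        while p != -1 and self.next[p].get(c) == q:
            self.next[p][c] = clone
            p = self.link[p]
        self.link[q] = clone
        return clone

    def _extend(self, last, c):
        # Case 1: the edge already exists (a new pattern repeats known text).
        if c in self.next[last]:
            q = self.next[last][c]
            if self.length[q] == self.length[last] + 1:
                return q
            return self._clone(last, q, c)
        # Case 2: create a new state and add edges along the suffix-link path.
        cur = self._new_state(self.length[last] + 1, {}, 0)
        p = last
        while p != -1 and c not in self.next[p]:
            self.next[p][c] = cur
            p = self.link[p]
        if p != -1:
            q = self.next[p][c]
            if self.length[q] == self.length[p] + 1:
                self.link[cur] = q
            else:
                self.link[cur] = self._clone(p, q, c)
        return cur

    def insert(self, pattern):
        """Add one pattern to the dictionary (semi-dynamic: insertions only)."""
        last = 0
        for c in pattern:
            last = self._extend(last, c)

    def contains_factor(self, s):
        """Return True iff s is a substring of some inserted pattern."""
        state = 0
        for c in s:
            state = self.next[state].get(c)
            if state is None:
                return False
        return True
```

For example, after inserting "abab" and "bba", `contains_factor("bab")` is True and `contains_factor("abb")` is False, since "abb" occurs in neither pattern.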

Similar resources

Approximate String Matching by Finite Automata

Abstract. Approximate string matching is a sequential problem, and therefore it can be solved using finite automata. A nondeterministic finite automaton is constructed for string matching with k mismatches. It is shown how "dynamic programming" and "shift-and" based algorithms simulate this nondeterministic finite automaton. The corresponding deterministic finite automaton has O...

Permutation Games for the Weakly Aconjunctive (arXiv:1710.08996v1 [cs.LO], 24 Oct 2017)

We introduce a natural notion of limit-deterministic parity automata and present a method that uses such automata to construct satisfiability games for the weakly aconjunctive fragment of the μ-calculus. To this end we devise a method that determinizes limit-deterministic parity automata of size n with k priorities through limit-deterministic Büchi automata to deterministic parity automata of s...

Sampled Fictitious Play is Hannan Consistent (arXiv:1610.01687v1 [cs.GT], 5 Oct 2016)

Fictitious play is a simple and widely studied adaptive heuristic for playing repeated games. It is well known that fictitious play fails to be Hannan consistent. Several variants of fictitious play including regret matching, generalized regret matching and smooth fictitious play, are known to be Hannan consistent. In this note, we consider sampled fictitious play: at each round, the player sam...

Succinct 2D Dictionary Matching with No Slowdown

The dictionary matching problem seeks all locations in a given text that match any of the patterns in a given dictionary. Efficient algorithms for dictionary matching scan the text once, searching for all patterns simultaneously. This paper presents the first 2-dimensional dictionary matching algorithm that operates in small space and linear time. Given d patterns, D = {P1, . . . , Pd}, each of...

Publication date: 2017